Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRP's favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
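As a point of reference for the monotone fits this abstract discusses, here is a minimal sketch of classical one-dimensional isotonic regression using the pool-adjacent-violators algorithm. It is not the IRP algorithm, which recursively partitions a multi-dimensional covariate space. The function name isotonic_fit and its arguments are illustrative assumptions, not taken from the paper.

```python
def isotonic_fit(y, w=None):
    """Least-squares nondecreasing fit to the sequence y (pool adjacent violators).

    Illustrative 1-D baseline only; names and signature are assumptions.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, isotonic_fit([1, 3, 2, 4]) pools the violating pair (3, 2) into their mean and returns [1.0, 2.5, 2.5, 4.0]. The unregularized fit interpolates as many level sets as the data allow, which is the overfitting behaviour the regularized path is meant to control.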
Privacy and Fairness in Recommender Systems via Adversarial Training of User Representations
Latent factor models for recommender systems represent users and items as low
dimensional vectors. Privacy risks of such systems have previously been studied
mostly in the context of recovery of personal information in the form of usage
records from the training data. However, the user representations themselves
may be used together with external data to recover private user information
such as gender and age. In this paper we show that user vectors calculated by a
common recommender system can be exploited in this way. We propose the
privacy-adversarial framework to eliminate such leakage of private information,
and study the trade-off between recommender performance and leakage both
theoretically and empirically using a benchmark dataset. An advantage of the
proposed method is that it also helps guarantee fairness of results, since all
implicit knowledge of a set of attributes is scrubbed from the representations
used by the model, and thus can't enter into the decision making. We discuss
further applications of this method towards the generation of deeper and more
insightful recommendations.
Comment: International Conference on Pattern Recognition and Method
Low Communication Complexity Protocols, Collision Resistant Hash Functions and Secret Key-Agreement Protocols
We study communication complexity in computational settings where bad inputs may exist, but they should be hard to find for any computationally bounded adversary.
We define a model where there is a source of public randomness but the inputs are chosen by a computationally bounded adversarial participant after seeing the public randomness. We show that breaking the known communication lower bounds of the private coins model in this setting is closely connected to known cryptographic assumptions. We consider the simultaneous messages model and the interactive communication model and show that for any non-trivial predicate (with no redundant rows, such as equality):
1. Breaking the bound in the simultaneous message case or the bound in the interactive communication case implies the existence of distributional collision-resistant hash functions (dCRH). This is shown using techniques from Babai and Kimmel (CCC '97). Note that with a CRH the lower bounds can be broken.
2. There are no protocols of constant communication in this preset randomness setting (unlike the plain public randomness model).
The other model we study is that of a stateful "free talk", where participants can communicate freely before the inputs are chosen and may maintain a state, and the communication complexity is measured only afterwards. We show that efficient protocols for equality in this model imply secret key-agreement protocols in a constructive manner. On the other hand, secret key-agreement protocols imply optimal (in terms of error) protocols for equality.
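To make the contrast with the plain public randomness model concrete, here is a minimal sketch of the textbook public-coin equality protocol, in which each party sends k parity bits regardless of input length. It illustrates that model only and is not a protocol from the paper. The names parity_fingerprint and equality_protocol, and the use of a shared seed to stand in for public randomness, are assumptions made for illustration.

```python
import random

def parity_fingerprint(x_bits, shared_rng, k):
    """Return k inner products (mod 2) of x with strings drawn from shared randomness."""
    out = []
    for _ in range(k):
        r = [shared_rng.getrandbits(1) for _ in x_bits]
        out.append(sum(a & b for a, b in zip(x_bits, r)) % 2)
    return out

def equality_protocol(x_bits, y_bits, seed, k=20):
    """Compare k-bit fingerprints; the public seed models the shared random string."""
    # Both parties derive identical random strings from the same public seed.
    alice = parity_fingerprint(x_bits, random.Random(seed), k)
    bob = parity_fingerprint(y_bits, random.Random(seed), k)
    return alice == bob
```

When the inputs are fixed before the randomness is drawn, unequal inputs of the same length are accepted with probability 2^-k. If the inputs are chosen after the seed is public, as in the preset randomness setting above, an adversary can defeat this particular linear fingerprint by solving linear equations over GF(2) whenever the inputs are longer than k bits, which is why hardness of finding bad inputs has to come from elsewhere.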
Privacy-Preserving Decision Tree Training and Prediction against Malicious Server
Privacy-preserving machine learning enables secure outsourcing of machine learning tasks to an untrusted service provider (server) while preserving the privacy of the user's data (client). Attaining good concrete efficiency for complicated machine learning tasks, such as training decision trees, is one of the challenges in this area. Prior works on privacy-preserving decision trees required the parties to have comparable computational resources, and instructed the client to perform computation proportional to the complexity of the entire task.
In this work we present new protocols for privacy-preserving decision trees, for both training and prediction, achieving the following desirable properties:
1. Efficiency: the client's complexity is independent of the training-set size during training, and of the tree size during prediction.
2. Security: privacy holds against malicious servers.
3. Practical usability: high accuracy, fast prediction, and feasible training demonstrated on standard UCI datasets, encrypted with fully homomorphic encryption.
To the best of our knowledge, our protocols are the first to offer all these properties simultaneously.
The core of our work consists of two technical contributions. First, a new low-degree polynomial approximation for functions, leading to faster protocols for training and prediction on encrypted data. Second, a design of an easy-to-use mechanism for proving privacy against malicious adversaries that is suitable for a wide family of protocols, and in particular, our protocols; this mechanism could be of independent interest.
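The role of polynomial approximation here can be illustrated with a small sketch: homomorphic encryption natively supports additions and multiplications, so a tree's threshold comparison has to be replaced by a polynomial. The code below uses a generic, well-known stand-in (repeated composition of the cubic map (3x - x^3)/2, which converges to sign(x) on [-1, 1]) and runs on plaintext numbers. It is not the approximation proposed in the paper, and the names poly_sign, soft_greater, and tree_predict are hypothetical.

```python
def poly_sign(x, rounds=6):
    """Approximate sign(x) for x in [-1, 1] using only additions and multiplications."""
    for _ in range(rounds):
        # Cubic map with fixed points at -1, 0, and 1; values are pushed toward +/-1.
        x = (3 * x - x ** 3) * 0.5
    return x

def soft_greater(a, b, rounds=6):
    """Approximate the indicator [a > b] for a, b in [0, 1]."""
    return (poly_sign(a - b, rounds) + 1) * 0.5

def tree_predict(x, threshold, left_value, right_value):
    """Depth-1 tree evaluated arithmetically, with no data-dependent branch."""
    go_right = soft_greater(x, threshold)
    return go_right * right_value + (1 - go_right) * left_value
```

With features scaled to [0, 1], tree_predict(0.9, 0.4, 10, 20) is close to 20 and tree_predict(0.1, 0.6, 10, 20) is close to 10. Accuracy degrades when the feature is very near the threshold, and each extra round raises the polynomial degree, so the choice of approximation directly affects both accuracy and cost on encrypted data.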
CHIP and CRISP: Protecting All Parties Against Compromise through Identity-Binding PAKEs
Recent advances in password-based key exchange (PAKE) protocols can offer stronger security guarantees for globally deployed security protocols. Notably, the OPAQUE protocol realizes saPAKE [Eurocrypt2018], strengthening the protection offered by aPAKE to compromised servers: after compromising an saPAKE server, the adversary still has to perform a full brute-force search to recover any passwords or impersonate users. However, (s)aPAKEs do not protect client storage, and can only be applied in the so-called asymmetric setting, in which some parties, such as servers, do not communicate with each other.
Nonetheless, passwords are also widely used in symmetric settings, where a group of parties share a password and can all communicate (e.g., Wi-Fi with client devices, routers, and mesh nodes; or industrial IoT scenarios). In these settings, the (s)aPAKE techniques cannot be applied, and the state-of-the-art still involves handling plaintext passwords.
In this work, we propose the notions of (strong) identity-binding PAKEs that improve this situation in two dimensions: they protect all parties from compromise, and can also be applied in the symmetric setting. We propose stronger counterparts to state-of-the-art security notions from the asymmetric setting in the UC model, and construct protocols that provably realize them. Our constructions bind the local storage of all parties to abstract identities, building on ideas from identity-based key exchange, but without requiring a third party.
Our first protocol, CHIP, generalizes the security of aPAKE protocols to all parties, forcing the adversary to perform a brute-force search to recover passwords or impersonate others. Our second protocol, CRISP, additionally renders any adversarial pre-computation useless, thereby offering saPAKE-like guarantees for all parties, instead of only the server.
We evaluate prototype implementations of our protocols and show that even though they offer stronger security, their performance is in line with, or even better than, state-of-the-art protocols.
The longitudinal structure of negative symptoms in treatment resistant schizophrenia
Background and hypothesis: The negative symptoms of schizophrenia are strong prognostic factors but remain poorly understood and treated. Five negative symptom domains are frequently clustered into the motivation and pleasure (MAP) and emotional expression (EE) ‘dimensions’, but whether this structure remains stable and behaves as a single entity or not remains unclear. Study design: We examined a cohort of 153 patients taking clozapine for treatment-resistant schizophrenia in a regional mental health clinic. Patients were assessed longitudinally over a mean period of 45 months using validated scales for positive, negative and mood symptoms. Network analyses were performed to identify symptom ‘communities’ and their stability over time. The influence of common causes of secondary negative symptoms as well as centrality measures were also examined. Study results: Across patients at baseline, two distinct communities matching the clinical domains of MAP and EE were found. These communities remained highly stable and independent over time. The communities remained stable when considering psychosis, depression, and sedation severity, and these causes of secondary negative symptoms were clustered into the MAP community. Centrality measures also remained stable over time, with similar centrality measures across symptoms. Conclusions: Our results suggest that MAP and EE are independent dimensions that remain highly stable over time in chronic schizophrenia patients treated with clozapine. Common causes of secondary negative symptoms mapped onto the MAP dimension. Our results emphasise the need for clinical trials to address either MAP or EE, and that treating causes of secondary negative symptoms may improve MAP.